44 - Lecture_10_2_Orthogonal_Directions_Descent [ID:39316]

Hi, we talked about gradient descent last time, and the problem with gradient descent was that it tends to exhibit this zigzag behavior. Well, my sketch may make it look a bit better than it actually is; it does something like that, zigzag like this. Today we are going to talk about something else, which I'm calling orthogonal directions descent.

This is not a method that works in practice; it cannot work, because it relies on information that we do not have. But it will still give us the right idea for how to fix this problem that gradient descent exhibits.

One important piece of information is that gradient descent can in fact converge quickly if the initial position is chosen right. So this is the zigzag behavior, and it is inefficient.
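To make the zigzag concrete, here is a small numerical sketch (illustrative, not from the lecture): gradient descent with exact line search on a quadratic φ(x) = ½ xᵀBx − bᵀx, where the matrix B, the vector b, and the starting point are my own assumed choices, picked to show the effect.

```python
import numpy as np

# Assumed quadratic phi(x) = 0.5 x^T B x - b^T x (B, b illustrative).
B = np.array([[10.0, 0.0],
              [0.0, 1.0]])   # ill-conditioned: elongated level sets
b = np.array([0.0, 0.0])
x_star = np.linalg.solve(B, b)   # the minimum

x = np.array([1.0, 10.0])        # a generic starting point
for k in range(50):
    r = b - B @ x                    # residual = negative gradient
    alpha = (r @ r) / (r @ (B @ r))  # exact line search along r
    x = x + alpha * r

# Each step turns by 90 degrees (consecutive residuals are orthogonal
# under exact line search), which is exactly the zigzag pattern.
print(np.linalg.norm(x - x_star))
```

Even after 50 steps the iterate has only slowly crept toward the minimum; the error shrinks by a constant factor per step, which is the inefficiency the zigzag picture illustrates.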

What is the right initial position? Of course, if we set the initial position at the minimum, then we have converged in zero steps. But there is also a way of converging in one step, and it is done as follows: if x0 is chosen such that r0 is an eigenvector of B, then x1, the position after one step, is identical to the minimum x*.

So this looks like that: these are level sets. We get the zigzag if we start here or here (sorry, you can't see the cursor), if we start somewhere here or somewhere here. But if we start at, let's say, this point, exactly on a half-axis of this ellipsoid, then we reach the minimum in one step. Let me draw this a bit better: we start here and then we jump to the minimum in one step. So if that is x0, then this is x1. And similarly, if we start here, we also converge in one step.

That is due to the fact that r0 is an eigenvector of B, and you can prove this in the exercises. It's an elementary computation; it's not super easy to see, but if you carry out the computation you will see that it is true.
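As a quick numerical sanity check of this claim (a sketch only; the matrix B, the vector b, and the eigenvector are my own illustrative choices), one can verify that starting with a residual that is an eigenvector of B gives convergence in a single exact-line-search step:

```python
import numpy as np

# Claim: if r0 = b - B x0 is an eigenvector of B, one gradient-descent
# step with exact line search lands on x* = B^{-1} b.
B = np.array([[10.0, 0.0],
              [0.0, 1.0]])
b = np.array([3.0, 2.0])
x_star = np.linalg.solve(B, b)

v = np.array([1.0, 0.0])      # eigenvector of B (eigenvalue 10)
x0 = x_star - v               # then r0 = b - B x0 = B v = 10 v, an eigenvector

r0 = b - B @ x0
alpha = (r0 @ r0) / (r0 @ (B @ r0))  # exact step length (here 1/eigenvalue)
x1 = x0 + alpha * r0

print(np.allclose(x1, x_star))       # True: converged in one step
```

The step length comes out as the reciprocal of the eigenvalue, so the step x1 = x0 + α r0 = x0 + v lands exactly on the minimum.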

So if we are lucky with our initial guess, gradient descent converges quickly, namely in one step. But obviously we cannot rely on luck, because it is very improbable to choose a point exactly along this axis or along this axis.

Then we might have another idea, which works like this: try to replace this zigzag behavior (these are supposed to be right angles) by one step and then one more step. So let's say we go to this point, then we go here. Sorry, this is also supposed to be a right angle. So how can we try to do this? How would we start replacing the zigzag behavior by this two-step method?

The idea would be to not stop here, but to go further along this direction, stop at the right point, and then make one step in this direction, instead of going along the first direction, the second direction, the first direction, the second direction, and so on. So if we were able to replace these infinitely many steps by going along this direction long enough and then along that direction long enough, then we would be exactly at the point we are trying to get to.

So set d0 (d is for direction) to be the negative gradient of φ at x0. And then set, well, we are in two dimensions here, so there are only two directions to choose; but in n dimensions we of course have d1, ..., d_{n-1} such that the full set of directions, including d0, is a basis of orthogonal vectors. We can always do that: we choose d0, for example, like this, and then we just choose
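The whole scheme can be sketched as follows (an illustrative sketch with assumed B, b, and starting point, not a practical implementation): walk along each orthogonal direction exactly "long enough". Note that the step lengths below are computed from the exact minimum x*, which is precisely the information we do not have in practice, as stated above.

```python
import numpy as np

# "Orthogonal directions descent" sketch: n orthogonal steps reach x*,
# but only because the step lengths cheat by using x* itself.
B = np.array([[10.0, 2.0],
              [2.0, 3.0]])
b = np.array([1.0, 1.0])
x_star = np.linalg.solve(B, b)

x = np.array([4.0, -2.0])
d0 = b - B @ x                  # d0 = -grad phi(x0), as in the lecture
d1 = np.array([-d0[1], d0[0]])  # any vector orthogonal to d0 (n = 2 here)

for d in (d0, d1):
    e = x_star - x              # the current error -- unknown in practice!
    alpha = (d @ e) / (d @ d)   # go along d exactly "long enough"
    x = x + alpha * d

print(np.allclose(x, x_star))   # True: two orthogonal steps suffice
```

Because d0 and d1 are orthogonal and span the plane, the two steps simply decompose the initial error into its components along d0 and d1 and remove them one at a time; after n such steps in n dimensions, the iterate equals x*.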

Part of a video series:

Access: open access

Duration: 00:18:49 min

Recording date: 2021-12-14

Uploaded: 2021-12-14 10:26:03

Language: en-US
